Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
Authors
Abstract
Article history: Submitted 14 April 2020; accepted 01 March 2021; published online 05 October 2021
Keywords: temporal difference learning, Polyak-Ruppert averaging, variance reduction
AMS subject headings: 68Q25, 68R10, 68U05
ISSN (online): 2577-0187
Publisher: Society for Industrial and Applied Mathematics
CODEN: sjmdaq
Similar resources
Analysis of Temporal-Difference Learning
We present new results about the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of a Markov chain using linear function approximators. The algorithm we analyze performs on-line updating of a parameter vector during a single endless trajectory of an aperiodic irreducible finite state Markov chain. Results include convergence (with probability 1), a ch...
Instance Optimal Learning
We consider the following basic learning task: given independent draws from an unknown distribution over a discrete support, output an approximation of the distribution that is as accurate as possible in ℓ1 distance (equivalently, total variation distance, or “statistical distance”). Perhaps surprisingly, it is often possible to “de-noise” the empirical distribution of the samples...
An Analysis of Temporal-Difference Learning with Function Approximation
We discuss the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of an infinite-horizon discounted Markov chain. The algorithm we analyze updates parameters of a linear function approximator online during a single endless trajectory of an irreducible aperiodic Markov chain with a finite or infinite state space. We present a proof of convergence (with pr...
Bayes Optimal Instance-Based Learning
In this paper we present a probabilistic formalization of the instance-based learning approach. In our Bayesian framework, moving from the construction of an explicit hypothesis to a data-driven instance-based learning approach is equivalent to averaging over all the (possibly infinitely many) individual models. The general Bayesian instance-based learning framework described in this paper can ...
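The two temporal-difference entries in this list describe an algorithm that updates the parameters of a linear function approximator online, along a single trajectory of a Markov chain, to approximate the discounted cost-to-go function. A minimal sketch of that kind of update (TD(0)) might look as follows. The function name `td0_linear`, the two-state chain, the costs, the one-hot features, and the step-size schedule are all illustrative assumptions and are not taken from any of the listed papers.

```python
import random

def td0_linear(P, cost, features, gamma, steps, seed=0):
    """TD(0) with a linear approximator, updated online along one trajectory.

    P        : row-stochastic transition matrix (list of lists)
    cost     : per-state cost g(s)
    features : feature vector phi(s) for each state
    Returns the final parameter vector theta, so that V(s) ~ theta . phi(s).
    """
    rng = random.Random(seed)
    states = range(len(P))
    theta = [0.0] * len(features[0])
    s = 0  # arbitrary start state (assumption)
    for t in range(steps):
        # Sample the next state from row s of the transition matrix.
        s_next = rng.choices(states, weights=P[s])[0]
        v = sum(w * x for w, x in zip(theta, features[s]))
        v_next = sum(w * x for w, x in zip(theta, features[s_next]))
        # Temporal-difference error for the observed transition.
        delta = cost[s] + gamma * v_next - v
        # Decaying step size; this particular schedule is an assumption.
        alpha = 10.0 / (100.0 + t)
        theta = [w + alpha * delta * x for w, x in zip(theta, features[s])]
        s = s_next
    return theta

# Hypothetical two-state chain with one-hot (tabular) features.
P = [[0.5, 0.5], [0.5, 0.5]]
cost = [1.0, 0.0]
features = [[1.0, 0.0], [0.0, 1.0]]
theta = td0_linear(P, cost, features, gamma=0.5, steps=20000)
print(theta)  # the exact cost-to-go for this chain is [1.5, 0.5]
```

With one-hot features the linear approximator is exact, so the estimate can be checked against the closed-form solution of V = g + γPV, which for this toy chain is V = [1.5, 0.5].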
Journal
Journal title: SIAM Journal on Mathematics of Data Science
Year: 2021
ISSN: 2577-0187
DOI: https://doi.org/10.1137/20m1331524